Data aggregation module supporting dynamic query responsive aggregation during the servicing of database query requests provided by one or more client machines

ABSTRACT

Improved method of and apparatus for aggregating data elements in multidimensional databases (MDDB). In the preferred embodiment, the apparatus is realized in the form of a high-performance stand-alone (i.e. external) aggregation server which can be plugged-into conventional MOLAP systems to achieve significant improvements in system performance. In accordance with the principles of the present invention, the stand-alone aggregation server contains a scalable MDDB and a high-performance aggregation engine that are integrated into the modular architecture of the aggregation server. The stand-alone aggregation server of the present invention can uniformly distribute data elements among a plurality of processors, for balanced loading and processing, and therefore is highly scalable. The stand-alone aggregation server of the present invention can be used to realize (i) an improved MDDB for supporting on-line analytical processing (OLAP) operations, (ii) an improved Internet URL Directory for supporting on-line information searching operations by Web-enabled client machines, as well as (iii) diverse types of MDDB-based systems for supporting real-time control of processes in response to complex states of information reflected in the MDDB.

RELATED CASES

This is a Continuation of application Ser. No. 10/854,034 filed May 25, 2004; which is a Continuation of application Ser. No. 10/153,164 filed May 21, 2002, which is a Continuation of application Ser. No. 09/514,611 filed Feb. 28, 2002, now U.S. Pat. No. 6,434,544; which is a Continuation-in-part of copending application Ser. No. 09/368,241 filed Aug. 4, 1999; each said application being commonly owned by HyperRoll Israel, Ltd., and incorporated herein by reference in its entirety.

BACKGROUND OF THE INVENTION

1. Field of Invention

The present invention relates to a method of and system for aggregating data elements in a multi-dimensional database (MDDB) supported upon a computing platform, and also to an improved method of and system for managing data elements within an MDDB during on-line analytical processing (OLAP) operations.

2. Brief Description of the State of the Art

The ability to act quickly and decisively in today's increasingly competitive marketplace is critical to the success of organizations. The volume of information that is available to corporations is rapidly increasing and frequently overwhelming. Those organizations that effectively and efficiently manage these tremendous volumes of data, and use the information to make business decisions, will realize a significant competitive advantage in the marketplace.

Data warehousing, the creation of an enterprise-wide data store, is the first step towards managing these volumes of data. The Data Warehouse is becoming an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems is stored. Over the last few years, improvements in price, performance, scalability, and robustness of open computing systems have made data warehousing a central component of Information Technology (IT) strategies. Details on methods of data integration and constructing data warehouses can be found in the white paper entitled “Data Integration: The Warehouse Foundation” by Louis Rolleigh and Joe Thomas.

Building a Data Warehouse has its own special challenges (e.g. using a common data model, common business dictionary, etc.) and is a complex endeavor. However, just having a Data Warehouse does not provide organizations with the often-heralded business benefits of data warehousing. To complete the supply chain from transactional systems to decision maker, organizations need to deliver systems that allow knowledge workers to make strategic and tactical decisions based on the information stored in these warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. OLAP systems allow knowledge workers to intuitively, quickly, and flexibly manipulate operational data using familiar business terms, in order to provide analytical insight into a particular problem or line of inquiry. For example, by using an OLAP system, decision makers can “slice and dice” information along a customer (or business) dimension, and view business metrics by product and through time. Reports can be defined from multiple perspectives that provide a high-level or detailed view of the performance of any aspect of the business. Decision makers can navigate throughout their database by drilling down on a report to view elements at finer levels of detail, or by pivoting to view reports from different perspectives. To enable such full-functioned business analyses, OLAP systems need to (1) support sophisticated analyses, (2) scale to large numbers of dimensions, and (3) support analyses against large atomic data sets. These three key requirements are discussed further below.

Decision makers use key performance metrics to evaluate the operations within their domain, and OLAP systems need to be capable of delivering these metrics in a user-customizable format. These metrics may be obtained from the transactional databases, precalculated and stored in the database, or generated on demand during the query process. Commonly used metrics include:

(1) Multidimensional Ratios (e.g. Percent to Total)

“Show me the contribution to weekly sales and category profit made by all items sold in the Northwest stores between July 1 and July 14.”

(2) Comparisons (e.g. Actual vs. Plan, this Period vs. Last Period)

“Show me the sales to plan percentage variation for this year and compare it to that of the previous year to identify planning discrepancies.”

(3) Ranking and Statistical Profiles (e.g. Top N/Bottom N, 70/30, Quartiles)

“Show me sales, profit and average call volume per day for my 20 most profitable salespeople, who are in the top 30% of the worldwide sales.”

(4) Custom Consolidations

“Show me an abbreviated income statement by quarter for the last two quarters for my Western Region operations.”
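The metric families listed above reduce to simple arithmetic over a slice of the cube. The sketch below is a hypothetical illustration in Python: the salesperson names, figures, and function names are invented for the example, and it computes a percent-to-total ratio, an actual-vs-plan comparison, and a Top N ranking over a single dimension.

```python
# Hypothetical sales and plan figures per salesperson (illustrative only).
sales = {"Ann": 120.0, "Bob": 80.0, "Cho": 200.0, "Dev": 100.0}
plan = {"Ann": 100.0, "Bob": 100.0, "Cho": 160.0, "Dev": 100.0}


def percent_to_total(values):
    """Multidimensional ratio: each member's share of the total, in percent."""
    total = sum(values.values())
    return {k: 100.0 * v / total for k, v in values.items()}


def variance_to_plan(actual, planned):
    """Comparison metric: percent deviation of actual from plan."""
    return {k: 100.0 * (actual[k] - planned[k]) / planned[k] for k in actual}


def top_n(values, n):
    """Ranking metric: the n members with the largest values, highest first."""
    return sorted(values, key=values.get, reverse=True)[:n]


print(percent_to_total(sales))        # Cho holds 40% of the total
print(variance_to_plan(sales, plan))  # Bob is 20% below plan
print(top_n(sales, 2))                # ['Cho', 'Ann']
```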

Knowledge workers analyze data from a number of different business perspectives or dimensions. As used hereinafter, a dimension is any element or hierarchical combination of elements in a data model that can be displayed orthogonally with respect to other combinations of elements in the data model. For example, if a report lists sales by week, promotion, store, and department, then the report would be a slice of data taken from a four-dimensional data model.

Target marketing and market segmentation applications involve extracting highly qualified result sets from large volumes of data. For example, a direct marketing organization might want to generate a targeted mailing list based on dozens of characteristics, including purchase frequency, size of the last purchase, past buying trends, customer location, age of customer, and gender of customer. These applications rapidly increase the dimensionality requirements for analysis.

The number of dimensions in OLAP systems ranges from a few orthogonal dimensions to hundreds of orthogonal dimensions. Orthogonal dimensions in an exemplary OLAP application might include Geography, Time, and Products.

Atomic data refers to the lowest level of data granularity required for effective decision making. In the case of a retail merchandising manager, “atomic data” may refer to information by store, by day, and by item. For a banker, atomic data may be information by account, by transaction, and by branch. Most organizations implementing OLAP systems find themselves needing systems that can scale to tens, hundreds, and even thousands of gigabytes of atomic information.

As OLAP systems become more pervasive and are used by the majority of the enterprise, more data over longer time frames will be included in the data store (i.e. data warehouse), and the size of the database will increase by at least an order of magnitude. Thus, OLAP systems need to be able to scale from present to near-future volumes of data.

In general, OLAP systems need to (1) support the complex analysis requirements of decision-makers, (2) analyze the data from a number of different perspectives (i.e. business dimensions), and (3) support complex analyses against large input (atomic-level) data sets from a Data Warehouse maintained by the organization using a relational database management system (RDBMS).

Vendors of OLAP systems classify OLAP systems as either Relational OLAP (ROLAP) or Multidimensional OLAP (MOLAP) based on their underlying architecture. Thus, there are two basic architectures for On-Line Analytical Processing systems: the ROLAP architecture and the MOLAP architecture.

Overview of the Relational OLAP (ROLAP) System Architecture

The Relational OLAP (ROLAP) system accesses data stored in a Data Warehouse to provide OLAP analyses. The premise of ROLAP is that OLAP capabilities are best provided directly against the relational database, i.e. the Data Warehouse.

The ROLAP architecture was invented to enable direct access of data from Data Warehouses, and therefore support optimization techniques to meet batch window requirements and provide fast response times. Typically, these optimization techniques include application-level table partitioning, pre-aggregate inferencing, denormalization support, and the joining of multiple fact tables.

As shown in FIG. 1A, a typical prior art ROLAP system has a three-tier (or three-layer) client/server architecture. The “database layer” utilizes relational databases for data storage, access, and retrieval processes. The “application logic layer” is the ROLAP engine, which executes the multidimensional reports from multiple users. The ROLAP engine integrates with a variety of “presentation layers,” through which users perform OLAP analyses.

After the data model for the data warehouse is defined, data from on-line transaction-processing (OLTP) systems is loaded into the relational database management system (RDBMS). If required by the data model, database routines are run to pre-aggregate the data within the RDBMS. Indices are then created to optimize query access times. End users submit multidimensional analyses to the ROLAP engine, which then dynamically transforms the requests into SQL execution plans. The SQL execution plans are submitted to the relational database for processing, the relational query results are cross-tabulated, and a multidimensional result data set is returned to the end user. ROLAP is a fully dynamic architecture capable of utilizing precalculated results when they are available, or dynamically generating results from atomic information when necessary.
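The request-to-SQL-to-crosstab flow described above can be sketched with Python's built-in sqlite3 module standing in for the RDBMS. The table layout, column names, and the rolap_query helper are assumptions made for illustration; a real ROLAP engine would generate far more elaborate execution plans and would validate dimension names against its metadata.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact table; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Q1", "A", 10.0), ("East", "Q1", "B", 5.0),
        ("East", "Q2", "A", 7.0), ("West", "Q1", "A", 3.0),
        ("West", "Q2", "B", 8.0), ("West", "Q2", "B", 2.0),
    ],
)


def rolap_query(conn, row_dim, col_dim, measure="amount"):
    """Translate a two-dimensional report request into SQL, then cross-tabulate.

    row_dim, col_dim and measure must be trusted column names: they are
    interpolated directly into the SQL text.
    """
    sql = (f"SELECT {row_dim}, {col_dim}, SUM({measure}) "
           f"FROM sales GROUP BY {row_dim}, {col_dim}")
    grid = defaultdict(dict)
    for row_key, col_key, total in conn.execute(sql):
        grid[row_key][col_key] = total  # relational rows -> crosstab cells
    return dict(grid)


print(rolap_query(conn, "region", "quarter"))
```

Pivoting the report is just a different pair of dimension names, e.g. `rolap_query(conn, "product", "region")`; the relational engine recomputes the grouping each time, which is the dynamic behavior the paragraph describes.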

Overview of MOLAP System Architecture

Multidimensional OLAP (MOLAP) systems utilize a proprietary multidimensional database (MDDB) to provide OLAP analyses. The main premise of this architecture is that data must be stored multidimensionally to be accessed and viewed multidimensionally.

As shown in FIG. 1B, prior art MOLAP systems have an Aggregation, Access and Retrieval module, which is responsible for all data storage, access, and retrieval processes, including data aggregation (i.e. pre-aggregation) in the MDDB. As also shown in FIG. 1B, the base data loader is fed with base data, at the most detailed level, from the Data Warehouse into the Multi-Dimensional Data Base (MDDB). On top of the base data, layers of aggregated data are built up by the Aggregation program, which is part of the Aggregation, Access and Retrieval module. As indicated in this figure, the application logic module is responsible for the execution of all OLAP requests/queries (e.g. ratios, ranks, forecasts, exception scanning, and slicing and dicing) of data within the MDDB. The presentation module integrates with the application logic module and provides an interface through which the end users view and request OLAP analyses on their client machines, which may be web-enabled through the infrastructure of the Internet. The client/server architecture of a MOLAP system allows multiple users to access the same multidimensional database (MDDB).

Information (i.e. basic data) from a variety of operational systems within an enterprise, comprising the Data Warehouse, is loaded into a prior art multidimensional database (MDDB) through a series of batch routines. The Express™ server by the Oracle Corporation is exemplary of a popular server which can be used to carry out the data loading process in prior art MOLAP systems. As shown in FIG. 2B, an exemplary 3-D MDDB is schematically depicted, showing geography, time and products as the “dimensions” of the database. The multidimensional data of the MDDB is organized in an array structure, as shown in FIG. 2C. Physically, the Express™ server stores data in pages (or records) of an information file. Pages contain 512, 2048, or 4096 bytes of data, depending on the platform and release of the Express™ server. In order to look up the physical record address from the database file recorded on a disk or other mass storage device, the Express™ server generates a data structure referred to as a “Page Allocation Table (PAT)”. As shown in FIG. 2D, the PAT tells the Express™ server the physical record number that contains the page of data. Typically, the PAT is organized in pages. The simplest way to access a data element in the MDDB is by calculating the “offset” using the additions and multiplications expressed by a simple formula:

$$\text{Offset} = \text{Months} + \text{Product} \times (\#\ \text{of Months}) + \text{City} \times (\#\ \text{of Months} \times \#\ \text{of Products})$$
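The offset formula is ordinary row-major array addressing. The following sketch assumes zero-based integer coordinates, hypothetical dimension sizes, 8-byte cells in 4096-byte pages, and a made-up page allocation table (PAT) mapping logical pages to physical record numbers; none of these values come from the Express™ server itself.

```python
# Hypothetical dimension sizes for a 3-D cube (cities x products x months).
N_MONTHS, N_PRODUCTS, N_CITIES = 12, 50, 200
TOTAL = N_MONTHS * N_PRODUCTS * N_CITIES
CELLS_PER_PAGE = 4096 // 8  # assumes 8-byte cells in a 4096-byte page

# Hypothetical PAT: logical page number -> physical record number.
N_PAGES = -(-TOTAL // CELLS_PER_PAGE)  # ceiling division
PAT = {page: 1000 + page for page in range(N_PAGES)}


def offset(month, product, city):
    """Offset = Months + Product*(# of Months) + City*(# of Months * # of Products)."""
    return month + product * N_MONTHS + city * (N_MONTHS * N_PRODUCTS)


def coords(off):
    """Inverse mapping: recover (month, product, city) from an offset."""
    city, rest = divmod(off, N_MONTHS * N_PRODUCTS)
    product, month = divmod(rest, N_MONTHS)
    return month, product, city


def physical_location(off):
    """Find the page holding the cell, then ask the PAT for its physical record."""
    page, slot = divmod(off, CELLS_PER_PAGE)
    return PAT[page], slot


off = offset(3, 2, 1)               # 3 + 2*12 + 1*600 = 627
print(off, physical_location(off))  # 627 (1001, 115)
```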

During an OLAP session, the response time of a multidimensional query on a prior art MDDB depends on how many cells in the MDDB have to be added “on the fly”. As the number of dimensions in the MDDB increases linearly, the number of cells in the MDDB increases exponentially. However, it is known that the majority of multidimensional queries deal with summarized high-level data. Thus, as shown in FIGS. 3A and 3B, once the atomic data (i.e. “basic data”) has been loaded into the MDDB, the general approach is to perform a series of calculations in batch in order to aggregate (i.e. pre-aggregate) the data elements along the orthogonal dimensions of the MDDB and fill the array structures thereof. For example, revenue figures for all retail stores in a particular state (e.g. New York) would be added together to fill the state-level cells in the MDDB. After the array structure in the database has been filled, integer-based indices are created and hashing algorithms are used to improve query access times. Pre-aggregation of dimension D0 is always performed along the cross-section of the MDDB along the D0 dimension.
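Two points in this paragraph can be checked with a few lines of Python: the dense cell count is the product of the dimension sizes, so each added dimension multiplies it, and batch pre-aggregation fills a parent level by summing its children. The city names, figures, and function names below are hypothetical.

```python
from collections import defaultdict
from math import prod


def cell_count(dim_sizes):
    """Cells in a dense cube: the product of the dimension sizes."""
    return prod(dim_sizes)


# Each added dimension of size 10 multiplies the cell count by 10.
print([cell_count([10] * d) for d in range(1, 6)])  # [10, 100, 1000, 10000, 100000]

# Hypothetical city-level revenue per quarter and a city -> state mapping.
revenue = {("Albany", "Q1"): 5.0, ("Buffalo", "Q1"): 3.0,
           ("Albany", "Q2"): 4.0, ("Newark", "Q1"): 6.0}
state_of = {"Albany": "NY", "Buffalo": "NY", "Newark": "NJ"}


def preaggregate(cells, parent_of):
    """Batch pre-aggregation: add each cell into its parent member's cell."""
    out = defaultdict(float)
    for (member, period), value in cells.items():
        out[(parent_of[member], period)] += value
    return dict(out)


print(preaggregate(revenue, state_of))  # state-level cells for NY and NJ
```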

As shown in FIG. 3C2, the initially loaded data in the MDDB is organized at its lowest dimensional hierarchy. As shown in FIGS. 3C1 and 3C3, the results of the pre-aggregations are stored in the neighboring parts of the MDDB.

As shown in FIG. 3C2, along the TIME dimension, weeks are the aggregation results of days, months are the aggregation results of weeks, and quarters are the aggregation results of months. While not shown in the figures, along the GEOGRAPHY dimension, states are the aggregation results of cities, countries are the aggregation results of states, and continents are the aggregation results of countries. By pre-aggregating (i.e. consolidating or compiling) all logical subtotals and totals along all dimensions of the MDDB, it is possible to carry out real-time MOLAP operations using a multidimensional database (MDDB) containing both basic (i.e. atomic) and pre-aggregated data. Once this compilation process has been completed, the MDDB is ready for use. Users request OLAP reports by submitting queries through the OLAP Application interface (e.g. using web-enabled client machines), and the application logic layer responds to the submitted queries by retrieving the stored data from the MDDB for display on the client machine.
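The level-by-level compilation can be sketched as a chain of roll-ups in which each level is summed from the level directly below it, here along a hypothetical GEOGRAPHY hierarchy (cities, states, countries, continents). The hierarchy maps, values, and function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical GEOGRAPHY hierarchy: one child -> parent map per level.
LEVELS = [
    ("state", {"Albany": "NY", "Buffalo": "NY", "Lyon": "Rhone"}),
    ("country", {"NY": "USA", "Rhone": "France"}),
    ("continent", {"USA": "North America", "France": "Europe"}),
]


def roll_up(base, levels):
    """Pre-aggregate every level of one dimension, bottom-up.

    Each level is computed from the level directly below it rather than
    from the base data, so every subtotal reuses already-summed values.
    """
    results = {"city": dict(base)}
    current = base
    for name, parent_of in levels:
        nxt = defaultdict(float)
        for member, value in current.items():
            nxt[parent_of[member]] += value
        current = results[name] = dict(nxt)
    return results


cube = roll_up({"Albany": 5.0, "Buffalo": 3.0, "Lyon": 2.0}, LEVELS)
print(cube["country"])    # {'USA': 8.0, 'France': 2.0}
print(cube["continent"])  # {'North America': 8.0, 'Europe': 2.0}
```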

Typically, in MDDB systems, the aggregated data is very sparse, tending to explode as the number of dimensions grows and dramatically slowing down the retrieval process (as described in the report entitled “Database Explosion: The OLAP Report”, http://www.olapreport.com/DatabaseExplosion.htm, incorporated herein by reference). Quick on-line retrieval of queried data is critical in delivering on-line response for OLAP queries. Therefore, the data structure of the MDDB, and the methods of storing, indexing and handling it, are dictated mainly by the need for fast retrieval of massive and sparse data.
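The following sketch illustrates why full pre-aggregation of sparse data tends to explode. Under the simplifying assumption that each dimension has only a leaf level plus an “all” total (no intermediate hierarchy levels), every non-empty base cell in a d-dimensional cube contributes to 2^d cells of the fully aggregated cube, itself included. The data, the ALL marker, and the full_cube function are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker meaning "aggregated over this dimension"


def full_cube(base):
    """Compute every group-by combination of a sparse base fact set.

    `base` maps coordinate tuples to values. Each coordinate is either kept
    or replaced by ALL, so one base cell feeds 2**d cube cells.
    """
    cube = defaultdict(float)
    for coords, value in base.items():
        for mask in product((False, True), repeat=len(coords)):
            key = tuple(ALL if m else c for c, m in zip(coords, mask))
            cube[key] += value
    return dict(cube)


# Hypothetical sparse base data: 3 non-empty cells in a 4-D space
# (geography, month, product, channel).
base = {("NY", "Jan", "tea", "web"): 1.0,
        ("NJ", "Feb", "jam", "shop"): 2.0,
        ("CA", "Mar", "oil", "web"): 3.0}
cube = full_cube(base)
print(len(base), len(cube))  # 3 45
```

Three non-empty base cells already yield 45 stored cells; adding dimensions or intermediate hierarchy levels makes the ratio grow further.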

Different solutions for this problem are disclosed in the following US patents, each of which is incorporated herein by reference in its entirety:

U.S. Pat. No. 5,822,751 “Efficient Multidimensional Data Aggregation Operator Implementation”

U.S. Pat. No. 5,805,885 “Method And System For Aggregation Objects”

U.S. Pat. No. 5,781,896 “Method And System For Efficiently Performing Database Table Aggregation Using An Aggregation Index”

U.S. Pat. No. 5,745,764 “Method And System For Aggregation Objects”

In all the prior art OLAP servers, the process of storing, indexing and handling the MDDB utilizes complex data structures to greatly improve retrieval speed, as part of the querying process, at the cost of slowing down storage and aggregation. The query-bound structure, which must support fast retrieval of queries in a restrictive environment of high sparsity and multiple hierarchies, is not optimal for fast aggregation.

In addition to the aggregation process, the Aggregation, Access and Retrieval module is responsible for all data storage, retrieval and access processes. The Logic module is responsible for the execution of OLAP queries. The Presentation module intermediates between the user and the logic module and provides an interface through which the end users view and request OLAP analyses. The client/server architecture allows multiple users to simultaneously access the multidimensional database.

In summary, general system requirements of OLAP systems include: (1) supporting sophisticated analysis, (2) scaling to a large number of dimensions, and (3) supporting analysis against large atomic data sets.

MOLAP system architecture is capable of providing analytically sophisticated reports and analysis functionality. However, requirements (2) and (3) fundamentally limit MOLAP's capability, because to be effective and to meet end-user requirements, MOLAP databases need a high degree of aggregation.

By contrast, the ROLAP system architecture allows the construction of systems requiring a low degree of aggregation, but such systems are significantly slower than systems based on MOLAP system architecture principles. The resulting long aggregation times of ROLAP systems impose severe limitations on their volumes and dimensional capabilities.

The graphs plotted in FIG. 5 clearly indicate the computational demands that are created when searching an MDDB during an OLAP session, where queries are presented to the MOLAP system and answers thereto are solicited, often under real-time constraints. However, prior art MOLAP systems have limited capabilities to dynamically create data aggregations or to calculate business metrics that have not been precalculated and stored in the MDDB.

The large volumes of data and the high dimensionality of certain market segmentation applications are orders of magnitude beyond the limits of current multidimensional databases.

ROLAP is capable of handling higher data volumes. However, the ROLAP architecture, despite its superiority in volume and dimensionality, suffers from several significant drawbacks as compared to MOLAP:

Full aggregation of large data volumes is very time consuming, while partial aggregation severely degrades query response.

It has a slower query response

It requires developers and end users to know SQL

SQL is less capable of the sophisticated analytical functionality necessary for OLAP

ROLAP provides limited application functionality

Thus, improved techniques for data aggregation within MOLAP systems would appear to allow the number of dimensions of, and the size of atomic (i.e. basic) data sets in, the MDDB to be significantly increased, and thus increase the usage of the MOLAP system architecture.

Also, improved techniques for data aggregation within ROLAP systems would appear to allow for maximized query performance on large data volumes, reduce the time of partial aggregations that degrade query response, and thus generally benefit ROLAP system architectures.

Thus, there is a great need in the art for an improved way of and means for aggregating data elements within a multi-dimensional database (MDDB), while avoiding the shortcomings and drawbacks of prior art systems and methodologies.

SUMMARY AND OBJECTS OF PRESENT INVENTION

Accordingly, it is an object of the present invention to provide an improved method of and system for managing data elements within a multidimensional database (MDDB) using a novel stand-alone (i.e. external) scalable data aggregation server, achieving a significant increase in system performance (e.g. decreased access/search time).

Another object of the present invention is to provide such a system, wherein the stand-alone aggregation server includes an aggregation engine that is integrated with an MDDB, to provide a cartridge-style plug-in accelerator which can communicate with virtually any conventional OLAP server.

Another object of the present invention is to provide such a stand-alone data aggregation server whose computational tasks are restricted to data aggregation, leaving all other OLAP functions to the MOLAP server and therefore complementing the OLAP server's functionality.

Another object of the present invention is to provide such a system, wherein the stand-alone aggregation server carries out an improved method of data aggregation within the MDDB which enables the dimensions of the MDDB to be scaled up to large numbers and large atomic (i.e. base) data sets to be handled within the MDDB.

Another object of the present invention is to provide such a stand-alone aggregation server, wherein the aggregation engine supports high-performance aggregation (i.e. data roll-up) processes to maximize query performance of large data volumes, and to reduce the time of partial aggregations that degrade the query response.

Another object of the present invention is to provide such a stand-alone, external scalable aggregation server, wherein its integrated data aggregation (i.e. roll-up) engine speeds up the aggregation process by orders of magnitude, enabling larger database analysis by lowering the aggregation times.

Another object of the present invention is to provide such a novel stand-alone scalable aggregation server for use in OLAP operations, wherein the scalability of the aggregation server enables (i) the speed of the aggregation process carried out therewithin to be substantially increased by distributing the computationally intensive tasks associated with data aggregation among multiple processors, and (ii) the large data sets contained within the MDDB of the aggregation server to be subdivided among multiple processors, thus allowing the size of atomic (i.e. basic) data sets within the MDDB to be substantially increased.
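One common way to distribute aggregation among processors, assumed here rather than taken from the present invention, is to hash-partition records by key so that each worker owns a disjoint set of keys; the partial results can then be merged without further summation, and a reasonable hash spreads keys roughly evenly across workers. The sketch uses a thread pool only to keep the example small; all names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def partition(records, n):
    """Hash-partition (key, value) records so each worker owns disjoint keys.

    crc32 is used instead of hash() so the assignment is stable across runs.
    """
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[crc32(key.encode()) % n].append((key, value))
    return parts


def aggregate(part):
    """Per-worker aggregation: sum values by key."""
    totals = defaultdict(float)
    for key, value in part:
        totals[key] += value
    return totals


def parallel_aggregate(records, n_workers=4):
    parts = partition(records, n_workers)
    # Threads keep the sketch short; CPU-bound work in CPython would
    # normally run in separate processes or on separate machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(aggregate, parts))
    merged = {}
    for p in partials:
        merged.update(p)  # keys are disjoint across partitions
    return merged


records = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0)]
print(parallel_aggregate(records, n_workers=3))
```

Because each key lands in exactly one partition, adding workers shrinks each partition without changing the merge step.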

Another object of the present invention is to provide such a novel stand-alone scalable aggregation server, which provides for uniform load balancing among processors for high efficiency and best performance, and linear scalability for extending the limits by adding processors.

Another object of the present invention is to provide a stand-alone, external scalable aggregation server, which is suitable for MOLAP as well as for ROLAP system architectures.

Another object of the present invention is to provide a novel stand-alone scalable aggregation server, wherein an MDDB and aggregation engine are integrated and the aggregation engine carries out a high-performance aggregation algorithm and novel storing and searching methods within the MDDB.

Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be supported on single-processor (i.e. sequential or serial) computing platforms, as well as on multi-processor (i.e. parallel) computing platforms.

Another object of the present invention is to provide a novel stand-alone scalable aggregation server which can be used as a complementary aggregation plug-in to existing MOLAP and ROLAP databases.

Another object of the present invention is to provide a novel stand-alone scalable aggregation server which carries out novel roll-up (i.e. bottom-up) and spread-down (i.e. top-down) aggregation algorithms.
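Roll-up and spread-down move values in opposite directions along a hierarchy. The sketch below assumes a proportional allocation rule for spread-down, which this section does not specify; the month names, figures, and function names are illustrative only.

```python
def rollup(children):
    """Bottom-up: the parent's value is the sum of its children."""
    return sum(children.values())


def spread_down(parent_total, weights):
    """Top-down: allocate a parent total to children in proportion to weights.

    Proportional allocation is an assumed rule for illustration; if all
    weights are zero, the total is split evenly.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        share = parent_total / len(weights)
        return {k: share for k in weights}
    return {k: parent_total * w / total_weight for k, w in weights.items()}


actual_by_month = {"Jan": 30.0, "Feb": 20.0, "Mar": 50.0}
quarter = rollup(actual_by_month)                   # 100.0
plan_by_month = spread_down(120.0, actual_by_month)
print(quarter, plan_by_month)  # 100.0 {'Jan': 36.0, 'Feb': 24.0, 'Mar': 60.0}
```

Rolling the spread values back up returns the original parent total, which is a useful consistency check for any allocation rule.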

Another object of the present invention is to provide a novel stand-alone scalable aggregation server which includes an integrated MDDB and aggregation engine which carries out full pre-aggregation and/or “on-the-fly” aggregation processes within the MDDB.
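The difference between full pre-aggregation and “on-the-fly” aggregation can be sketched as eager versus lazy computation of group-by results with a cache. The OnTheFlyAggregator class below is a hypothetical illustration, not the aggregation engine of the present invention: a query names the dimensions to keep, the remaining dimensions are summed out, and results are cached.

```python
from collections import defaultdict
from itertools import combinations


class OnTheFlyAggregator:
    """Compute a group-by only when a query asks for it, then cache it."""

    def __init__(self, base, dims):
        self.base = base  # {coordinate tuple: value}
        self.dims = dims  # dimension names, in coordinate order
        self.cache = {}

    def query(self, keep):
        """Aggregate over every dimension not listed in `keep`."""
        keep = tuple(d for d in self.dims if d in keep)  # canonical order
        if keep not in self.cache:
            idx = [self.dims.index(d) for d in keep]
            out = defaultdict(float)
            for coords, value in self.base.items():
                out[tuple(coords[i] for i in idx)] += value
            self.cache[keep] = dict(out)
        return self.cache[keep]

    def preaggregate_all(self):
        """Full pre-aggregation: materialize every subset of dimensions."""
        for r in range(len(self.dims) + 1):
            for keep in combinations(self.dims, r):
                self.query(keep)


agg = OnTheFlyAggregator({("NY", "Q1"): 5.0, ("NY", "Q2"): 4.0,
                          ("NJ", "Q1"): 6.0}, ["geo", "time"])
print(agg.query({"geo"}))  # {('NY',): 9.0, ('NJ',): 6.0}
print(agg.query(set()))    # {(): 15.0}
```

On-the-fly mode pays the aggregation cost only for queried subsets; calling `preaggregate_all()` shifts that cost to load time in exchange for cache hits on every later query.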

Another object of the present invention is to provide such a novel stand-alone scalable aggregation server which is capable of supporting an MDDB having multi-hierarchy dimensionality.

Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from an RDBMS Data Warehouse.

Another object of the present invention is to provide a novel method of aggregating multidimensional data of atomic data sets originating from other sources, such as external ASCII files, MOLAP servers, or other end-user applications.

Another object of the present invention is to provide a novel stand-alone scalable data aggregation server which can communicate with any MOLAP server via a standard ODBC, OLE DB or DLL interface, in a manner completely transparent to the (client) user, without any time delays in queries, equivalent to storage in the MOLAP server's cache.

Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of MOLAP into large-scale applications including Banking, Insurance, Retail and Promotion Analysis.

Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which dramatically expands the boundaries of high-volatility type ROLAP applications such as, for example, the precalculation of data to maximize query performance.

Another object of the present invention is to provide a generic plug-in cartridge-type data aggregation component, suitable for all MOLAP systems of different vendors, dramatically reducing their aggregation burdens.

Another object of the present invention is to provide a novel high-performance cartridge-type data aggregation server which, having standardized interfaces, can be plugged into the OLAP system of virtually any user or vendor.

Another object of the present invention is to provide a novel “cartridge-style” (stand-alone) scalable data aggregation engine which has the capacity to convert long batch-type data aggregations into interactive sessions.

These and other objects of the present invention will become apparent hereinafter and in the Claims to Invention set forth herein.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more fully appreciate the objects of the present invention, the following Detailed Description of the Illustrative Embodiments should be read in conjunction with the accompanying Drawings, wherein:

FIG. 1A is a schematic representation of an exemplary prior art relational on-line analytical processing (ROLAP) system comprising a three-tier (or three-layer) client/server architecture, wherein the first tier has a database layer utilizing relational databases (RDBMS) for data storage, access, and retrieval processes, the second tier has an application logic layer (i.e. the ROLAP engine) for executing the multidimensional reports from multiple users, and the third tier integrates the ROLAP engine with a variety of presentation layers, through which users perform OLAP analyses;

FIG. 1B is a schematic representation of a generalized embodiment of a prior art multidimensional on-line analytical processing (MOLAP) system comprising a base data loader for receiving atomic (i.e. base) data from a Data Warehouse realized by an RDBMS, an OLAP multidimensional database (MDDB), an aggregation, access and retrieval module, an application logic module and a presentation module associated with a conventional OLAP server (e.g. Oracle's Express Server) for supporting on-line transactional processing (OLTP) operations on the MDDB, to service database queries and requests from a plurality of OLAP client machines typically accessing the system from an information network (e.g. the Internet);

FIG. 2A is a schematic representation of the Data Warehouse shown in the prior art system of FIG. 1B comprising numerous data tables (e.g. T1, T2, . . . Tn) and data field links, and the OLAP multidimensional database of FIG. 1B, comprising a conventional page allocation table (PAT) with pointers pointing to the physical storage of variables in an information storage device;

FIG. 2B is a schematic representation of an exemplary three-dimensional MDDB organized as a 3-dimensional Cartesian cube and used in the prior art system of FIG. 2A, wherein the first dimension of the MDDB is representative of geography (e.g. cities, states, countries, continents), the second dimension of the MDDB is representative of time (e.g. days, weeks, months, years), the third dimension of the MDDB is representative of products (e.g. all products, by manufacturer), and the basic data element is a set of variables which are addressed by 3-dimensional coordinate values;

FIG. 2C is a schematic representation of a prior art array structure associated with an exemplary three-dimensional MDDB, arranged according to a dimensional hierarchy;

FIG. 2D is a schematic representation of a prior art page allocation table for an exemplary three-dimensional MDDB, arranged according to pages of data element addresses;

FIG. 3A is a schematic representation of a prior art MOLAP system, illustrating the process of periodically storing raw data in the RDBMS Data Warehouse thereof, serially loading basic data from the Data Warehouse to the MDDB, and serially pre-aggregating (or pre-compiling) the data in the MDDB along the entire dimensional hierarchy thereof;

FIG. 3B is a schematic representation illustrating that the Cartesian addresses listed in a prior art page allocation table (PAT) point to where physical storage of data elements (i.e. variables) occurs in the information recording media (e.g. storage volumes) associated with the MDDB, during the loading of basic data into the MDDB as well as during data pre-aggregation processes carried out therewithin;

FIG. 3C1 is a schematic representation of an exemplary three-dimensional database used in a conventional MOLAP system of the prior art, showing that each data element contained therein is physically stored at a location in the recording media of the system which is specified by the dimensions (and subdimensions within the dimensional hierarchy) of the data variables which are assigned integer-based coordinates in the MDDB, and also that data elements associated with the basic data loaded into the MDDB are assigned lower integer coordinates in MDDB space than pre-aggregated data elements contained therewithin;

FIG. 3C2 is a schematic representation illustrating that a conventional hierarchy of the dimension of “time” typically contains the subdimensions “days, weeks, months, quarters, etc.” of the prior art;

FIG. 3C3 is a schematic representation showing how data elements having higher subdimensions of time in the MDDB of the prior art are typically assigned increased integer addresses along the time dimension thereof;

FIG. 4 is a schematic representation illustrating that, for very large prior art MDDBs, very large page allocation tables (PATs) are required to represent the address locations of the data elements contained therein, and thus there is a need to employ address data paging techniques between the DRAM (e.g. program memory) and mass storage devices (e.g. recording discs or RAIDs) available on the serial computing platform used to implement such prior art MOLAP systems;

FIG. 5 is a graphical representation showing how search time in a conventional (i.e. prior art) MDDB increases in proportion to the amount of preaggregation of data therewithin;

FIG. 6A is a schematic representation of a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising a Data Warehouse realized as a relational database, a stand-alone Aggregation Server of the present invention having an integrated aggregation engine and MDDB, and an OLAP server supporting a plurality of OLAP clients, wherein the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions;

FIG. 6B is a schematic block diagram of the stand-alone Aggregation Server of the illustrative embodiment shown in FIG. 6A, showing its primary components, namely, a base data interface (e.g. OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.) for receiving RDBMS flat files, lists and other files from the Data Warehouse (RDBMS), a base data loader for receiving base data from the base data interface, a configuration manager for managing the operation of the base data interface and base data loader, an aggregation engine for receiving base data from the base loader, a multi-dimensional database (MDDB), a MDDB handler for handling the movement of base data and aggregation data between the aggregation engine and the MDDB, an input analyzer, an aggregation client interface (e.g. OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.) for receiving requests from clients (such as an OLAP Server, spreadsheet application, end-user application) and receiving data files from the input analyzer for transfer to requesting clients, and a configuration manager for managing the operation of the input analyzer and the aggregation client interface;

FIG. 6C is a schematic representation of the software modules comprising the aggregation engine and MDDB handler of the stand-alone Aggregation Server of the illustrative embodiment of the present invention, showing a base data list structure being supplied to a hierarchy analysis and reorder module, the output thereof being transferred to an aggregation management module, the output thereof being transferred to a storage module via a storage management module, and a Query Directed Roll-up (QDR) aggregation management module being provided for receiving database (DB) requests from OLAP client machines and managing the operation of the aggregation and storage management modules of the present invention;

FIG. 6D is a flow chart representation of the primary operations carried out by the DB request serving mechanism within the QDR aggregation management module shown in FIG. 6C;

FIG. 6E is a schematic representation of a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising a Data Warehouse realized as a relational database, a stand-alone Aggregation Server of the present invention, having an integrated aggregation engine and MDDB, that is capable of being plugged into (i.e., interfaced to) OLAP servers (two shown) of different users or vendors each supporting a plurality of OLAP clients; the stand-alone Aggregation Server performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions for such OLAP Server(s);

FIG. 7A is a schematic representation of a separate-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment of FIG. 6B and a conventional OLAP server supporting a plurality of client machines, wherein base data from a Data Warehouse is shown being received by the aggregation server, realized on a first hardware/software platform (i.e. Platform A) and the stand-alone Aggregation Server is shown serving the conventional OLAP server, realized on a second hardware/software platform (i.e. Platform B), as well as serving data aggregation requirements of other clients supporting diverse applications such as spreadsheet, GUI front end, and applications;

FIG. 7B is a schematic representation of a shared-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment of FIG. 6B and a conventional OLAP server supporting a plurality of client machines, wherein base data from a Data Warehouse is shown being received by the stand-alone Aggregation Server, realized on a common hardware/software platform and the aggregation server is shown serving the conventional OLAP server, realized on the same common hardware/software platform, as well as serving data aggregation requirements of other clients supporting diverse applications such as spreadsheet, GUI front end, and applications;

FIG. 8A is a data table setting forth information representative of performance benchmarks obtained by the shared-platform type implementation of the stand-alone Aggregation Server of the illustrative embodiment serving the conventional OLAP server (i.e. Oracle EXPRESS Server) shown in FIG. 7B, wherein the common hardware/software platform is realized using a Pentium II 450 MHz, 1 GB RAM, 18 GB Disk, running the Microsoft NT operating system (OS);

FIG. 9A is a schematic representation of the first stage in the method of segmented aggregation according to the principles of the present invention, showing initial aggregation along the 1st dimension;

FIG. 9B is a schematic representation of the next stage in the method of segmented aggregation according to the principles of the present invention, showing that any segment along dimension 1, such as the shown slice, can be separately aggregated along the remaining dimensions, 2 and 3, and that in general, for an N-dimensional system, the second stage involves aggregation in N−1 dimensions. The principle of segmentation can be applied on the first stage as well; however, only a large enough data set will justify such a sliced procedure in the first dimension. Actually, it is possible to consider each segment as an N−1 cube, enabling recursive computation.

FIG. 9C1 is a schematic representation of the Query Directed Roll-up (QDR) aggregation method/procedure of the present invention, showing data aggregation starting from existing basic data or previously aggregated data in the first dimension (D1), and such aggregated data being utilized as a basis for QDR aggregation along the second dimension (D2);

FIG. 9C2 is a schematic representation of the Query Directed Roll-up (QDR) aggregation method/procedure of the present invention, showing initial data aggregation starting from existing previously aggregated data in the second third (D3), and continuing along the third dimension (D3), and thereafter continuing aggregation along the second dimension (D2);

FIG. 10A is a schematic representation of the “slice-storage” method of storing sparse data in the disk storage devices of the MDDB of FIG. 6B in accordance with the principles of the present invention, based on an ascending-ordered index along the aggregation direction, enabling fast retrieval of data;

FIG. 10B is a schematic representation of the data organization of the data files and the directory file used in the storage of the MDDB of FIG. 6B, and the method of searching for a queried data point therein using a simple binary search technique, made possible by the ascending order of the data files;

FIG. 11A is a schematic representation of three exemplary multi-hierarchical data structures for storage of data within the MDDB of FIG. 6B, having three levels of hierarchy, wherein the first level, representative of base data, is composed of items A, B, F, and G, the second level is composed of items C, E, H and I, and the third level is composed of a single item D, which is common to all three hierarchical structures;

FIG. 11B is a schematic representation of an optimized multi-hierarchical data structure merged from all three hierarchies of FIG. 11A, in accordance with the principles of the present invention;

FIG. 12 is a schematic representation showing the levels of operations performed by the stand-alone Aggregation Server of FIG. 6B, summarizing the different enabling components for carrying out the method of segmented aggregation in accordance with the principles of the present invention; and

FIG. 13 is a schematic representation of the stand-alone Aggregation Server of the present invention shown as a component of a central data warehouse, serving the data aggregation needs of URL directory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systems alike.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS OF THE PRESENT INVENTION

Referring now to FIGS. 6A through 13, the preferred embodiments of the method and system of the present invention will now be described in great detail hereinbelow, wherein like elements in the Drawings shall be indicated by like reference numerals.

Throughout this invention disclosure, the terms “aggregation” and “preaggregation” shall be understood to mean the process of summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.

In general, the stand-alone aggregation server and methods of and apparatus for data aggregation of the present invention can be employed in a wide range of applications, including MOLAP systems, ROLAP systems, Internet URL-directory systems, personalized on-line e-commerce shopping systems, Internet-based systems requiring real-time control of packet routing and/or switching, and the like.

For purposes of illustration, initial focus will be accorded to improvements in MOLAP systems, in which knowledge workers are enabled to intuitively, quickly, and flexibly manipulate operational data within a MDDB using familiar business terms in order to provide analytical insight into a business domain of interest.

FIG. 6A illustrates a generalized embodiment of a multidimensional on-line analytical processing (MOLAP) system of the present invention comprising: a Data Warehouse 601 realized as a relational database; a stand-alone cartridge-style Aggregation Server 603 of the present invention having an integrated aggregation engine and a MDDB; and an OLAP server 605 communicating with the Aggregation Server 603, and supporting a plurality of OLAP clients. In accordance with the principles of the present invention, the stand-alone Aggregation Server 603 performs aggregation functions (e.g. summation of numbers, as well as other mathematical operations, such as multiplication, subtraction, division etc.) and multi-dimensional data storage functions. Departing from conventional practices, the principles of the present invention teach moving the aggregation engine and the MDDB into a separate Aggregation Server 603 having standardized interfaces so that it can be plugged into the OLAP system of virtually any user or vendor. This feature is illustrated in FIG. 6E, wherein the Aggregation Server 603 can be plugged into (e.g., interfaced to) OLAP Servers (two shown as 605′ and 605″) of different users or vendors. As shown, the Aggregation Server 603 is operably plugged into (e.g., interfaced to) OLAP Server 605′ of one user or vendor, yet it is also capable of being operably plugged into OLAP server 605″ of another user or vendor, as indicated by the dotted lines. This move removes the restrictive dependency of aggregation on the analytical functions of OLAP, and allows novel and independent aggregation algorithms to be applied. The stand-alone data aggregation server enables efficient organization and handling of data, fast aggregation processing, and fast access to and retrieval of any data element in the MDDB.

As will be described in greater detail hereinafter, the Aggregation Server 603 of the present invention can serve the data aggregation requirements of other types of systems besides OLAP systems such as, for example, URL directory management, Data Marts, RDBMS, or ROLAP.

The Aggregation Server 603 of the present invention excels in performing two distinct functions, namely: the aggregation of data in the MDDB; and the handling of the resulting database in the MDDB for “on demand” client use. In the case of serving an OLAP system, the Aggregation Server 603 of the present invention focuses on performing these two functions in a high-performance manner (i.e. aggregating and storing base data, originated at the Data Warehouse, in a multidimensional storage (MDDB), and providing the results of this data aggregation process on demand to the clients, such as the OLAP server 605, spreadsheet applications, and end-user applications). As such, the Aggregation Server 603 of the present invention frees each conventional OLAP server 605 with which it interfaces from the need to perform data aggregations, and therefore allows the conventional OLAP server 605 to concentrate on the primary functions of OLAP servers, namely: data analysis and supporting a graphical interface with the user client.

FIG. 6B shows the primary components of the stand-alone Aggregation Server 603 of the illustrative embodiment, namely: a base data interface 611 (e.g. OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.) for receiving RDBMS flat files, lists and other files from the Data Warehouse (RDBMS); a base data loader 612 for receiving base data from the base data interface 611; a configuration manager 613 for managing the operation of the base data interface 611 and base data loader 612; an aggregation engine 621 for receiving base data from the base loader 612; a multi-dimensional database (MDDB) 625; a MDDB handler 623; an input analyzer 627; an aggregation client interface 629 (e.g. OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.); and a configuration manager 631 for managing the operation of the input analyzer 627 and the aggregation client interface 629.

During operation, the base data originates at the data warehouse or other sources, such as external ASCII files, a MOLAP server, or others. The Configuration Manager 613, in order to enable proper communication with all possible sources and data structures, configures two blocks, the Base Data Interface 611 and the Data Loader 612. Their configuration is matched with different standards such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc.

As shown in FIG. 6B, the core of the data Aggregation Server 603 of the present invention comprises: a data Aggregation Engine 621; a MDDB Handler 623; and a Multidimensional Database (MDDB) 625. The results of data aggregation are efficiently stored in a multidimensional structure within the Multidimensional Database (MDDB) 625 by the MDDB Handler 623.

As shown in FIGS. 6A and 6B, the stand-alone Aggregation Server 603 of the present invention serves the OLAP Server 605 via standard interfaces, such as OLDB, OLE-DB, ODBC, SQL, API, JDBC, etc. Aggregation results required by the OLAP Server 605 are supplied on demand. Typically, the OLAP Server 605 decomposes the query, via a parsing process, into a series of requests. Each such request, specifying an n-dimensional coordinate, is presented to the Aggregation Server 603 for the coordinate's value. The Configuration Manager 631 sets the Aggregation Client Interface 629 and Input Analyzer 627 for a proper communication protocol according to the client user (e.g., OLAP Server 605). The Input Analyzer 627 converts the input format to make it suitable for the MDDB Handler 623.
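As a rough illustration of the decomposition step, the following Python sketch expands a query, given as the list of requested members in each dimension, into one n-dimensional coordinate per request. The function name `decompose_query` and the member names are hypothetical; actual OLAP servers use their own query languages and parsers.

```python
from itertools import product

def decompose_query(members_per_dimension):
    """Expand a query into one request per n-dimensional coordinate.

    members_per_dimension holds, for each dimension, the members the query asks for.
    Each returned tuple is one coordinate whose value the aggregation server supplies.
    """
    return list(product(*members_per_dimension))

# Hypothetical query: two regions x two quarters for a single measure.
requests = decompose_query([["North", "South"], ["Q1", "Q2"], ["Sales"]])
for coordinate in requests:
    print(coordinate)
```

Each printed tuple, such as `("North", "Q1", "Sales")`, corresponds to one request presented to the Aggregation Server 603.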

An object of the present invention is to make the transfer of data completely transparent to the OLAP user, in a manner which is equivalent to the storing of data in the cache of the OLAP server 605 and without any query delays. This requires that the stand-alone Aggregation Server 603 have exceptionally fast response characteristics. This object is enabled by providing the unique data structure and aggregation mechanism of the present invention.

FIG. 6C shows the software modules comprising the aggregation engine and MDDB handler components 615 of the stand-alone Aggregation Server 603 of the illustrative embodiment. The base data list, as it arrives from RDBMS or text files, has to be analyzed and reordered to optimize hierarchy handling, according to the unique method of the present invention, as described later with reference to FIGS. 11A and 11B.

The function of the aggregation management module is to administer the aggregation process according to the method illustrated in FIGS. 9A and 9B.

In accordance with the principles of the present invention, data aggregation within the stand-alone Aggregation Server 603 can be carried out either as a complete pre-aggregation process, where the base data is fully aggregated before commencing querying, or as a query directed roll-up (QDR) process, where querying is allowed at any stage of aggregation using the “on-the-fly” data aggregation process of the present invention. The QDR process will be described hereinafter in greater detail with reference to FIG. 9C. A request (i.e. a basic component of a client query) is answered either by calling the aggregation management module for “on-the-fly” data aggregation, or by accessing pre-aggregated result data via the MDDB Handler module. The query/request serving mechanism of the present invention within the QDR aggregation management module is illustrated in the flow chart of FIG. 6D.

The function of the MDDB Handler (i.e., “management”) module is to handle multidimensional data in the MDDB 625 in a very efficient way, according to the novel method of the present invention, which will be described in detail hereinafter with reference to FIGS. 10A and 10B.

The request serving mechanism shown in FIG. 6D is controlled by the QDR aggregation management module. Requests are queued and served one by one. If the required data is already pre-calculated, then it is retrieved by the MDDB Handler module and returned to the client (e.g., OLAP Server 605). Otherwise, the required data is calculated “on-the-fly” by the aggregation management module, and the result is moved out to the client (e.g., OLAP Server 605) while simultaneously being stored by the MDDB Handler module, shown in FIG. 6C.
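The serving loop can be sketched in Python as follows. This is a minimal, single-threaded sketch under stated assumptions: the class name `QDRRequestServer`, the dictionary standing in for the MDDB Handler, and the callable standing in for the aggregation management module are all hypothetical, and the computed result is stored just before it is returned rather than concurrently.

```python
from collections import deque

class QDRRequestServer:
    """Serve queued coordinate requests one by one, in the spirit of FIG. 6D."""

    def __init__(self, store, aggregate_on_the_fly):
        self.store = store                                # plays the MDDB Handler: coordinate -> value
        self.aggregate_on_the_fly = aggregate_on_the_fly  # plays the aggregation management module
        self.queue = deque()

    def submit(self, coordinate):
        self.queue.append(coordinate)

    def serve_all(self):
        results = []
        while self.queue:
            coordinate = self.queue.popleft()
            if coordinate in self.store:          # already pre-calculated: just retrieve it
                value = self.store[coordinate]
            else:                                 # compute now, and keep it for later requests
                value = self.aggregate_on_the_fly(coordinate)
                self.store[coordinate] = value
            results.append((coordinate, value))
        return results

# Usage with a stand-in aggregation function that records when it is called.
calls = []

def fake_aggregate(coordinate):
    calls.append(coordinate)
    return 42

server = QDRRequestServer({("A", "ALL"): 10}, fake_aggregate)
for c in [("A", "ALL"), ("B", "ALL"), ("B", "ALL")]:
    server.submit(c)
results = server.serve_all()
```

The second request for `("B", "ALL")` is served from the store, so the on-the-fly aggregation runs only once for that coordinate.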

FIGS. 7A and 7B outline two different implementations of the stand-alone (cartridge-style) Aggregation Server 603 of the present invention. In both implementations, the Aggregation Server 603 supplies aggregated MDDB results to a client (e.g., OLAP Server 605).

FIG. 7A shows a separate-platform implementation of the MOLAP system of the illustrative embodiment shown in FIG. 6A, wherein the Aggregation Server 603 of the present invention resides on a separate hardware platform and operating system (OS) from those used to run the OLAP server 605. In this type of implementation, it is even possible to run the Aggregation Server 603 and the OLAP Server 605 on different types of operating systems (e.g. NT, Unix, MAC OS).

FIG. 7B shows a common-platform implementation of the MOLAP system of the illustrative embodiment shown in FIG. 6B, wherein the Aggregation Server 603 of the present invention shares the same hardware platform and operating system (OS) as that used to run the client OLAP Server 605.

FIG. 8A shows a table setting forth the benchmark results of an aggregation engine, implemented on a shared/common hardware platform and OS, in accordance with the principles of the present invention. The common platform and OS is realized using a Pentium II 450 MHz, 1 GB RAM, 18 GB Disk, running the Microsoft NT operating system. The six (6) data sets shown in the table differ in number of dimensions, number of hierarchies, measure of sparsity and data size. A comparison with ORACLE Express, a major OLAP server, is made. According to these benchmarks, the aggregation engine of the present invention outperforms currently leading aggregation technology by more than an order of magnitude.

The segmented data aggregation method of the present invention is described in FIGS. 9A through 9C2. These figures outline a simplified setting of three dimensions only; however, the following analysis applies to any number of dimensions as well.

The data is divided into autonomous segments to minimize the amount of simultaneously handled data. The initial aggregation is practiced on a single dimension only, while later on the aggregation process involves all other dimensions.

At the first stage of the aggregation method, an aggregation is performed along dimension 1. The first stage can be performed on more than one dimension. As shown in FIG. 9A, the space of the base data is expanded by the aggregation process.

In the next stage, shown in FIG. 9B, any segment along dimension 1, such as the shown slice, can be separately aggregated along the remaining dimensions, 2 and 3. In general, for an N-dimensional system, the second stage involves aggregation in N−1 dimensions.

The principle of data segmentation can be applied on the first stage as well. However, only a large enough data set will justify such a sliced procedure in the first dimension. Actually, it is possible to consider each segment as an N−1 cube, enabling recursive computation.
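The two stages can be sketched in Python for a small sparse cube held as a dictionary from coordinate tuples to values. The sketch assumes summation as the aggregation operator and uses a hypothetical `ALL` member to mark rolled-up coordinates; it illustrates only the ordering of the stages (dimension 1 first, then each slice on its own), not the storage layout or performance of the invention.

```python
from collections import defaultdict

ALL = "ALL"  # hypothetical marker for a rolled-up (aggregated) member

def aggregate_along(cells, dim):
    """Return `cells` plus roll-up cells that sum over dimension `dim`."""
    totals = defaultdict(float)
    for coord, value in cells.items():
        if coord[dim] != ALL:
            rolled = coord[:dim] + (ALL,) + coord[dim + 1:]
            totals[rolled] += value
    result = dict(cells)
    result.update(totals)
    return result

def segmented_aggregation(base, n_dims):
    # Stage 1: roll up along dimension 1 only (index 0 here).
    stage1 = aggregate_along(base, 0)
    # Split the expanded space into independent slices, one per member of dimension 1.
    slices = defaultdict(dict)
    for coord, value in stage1.items():
        slices[coord[0]][coord] = value
    # Stage 2: each slice is an (N-1)-dimensional cube aggregated on its own,
    # so the slices could be processed in any order, or in parallel.
    full = {}
    for cells in slices.values():
        for dim in range(1, n_dims):
            cells = aggregate_along(cells, dim)
        full.update(cells)
    return full

base = {(0, 0, 0): 1, (0, 1, 0): 2, (1, 0, 1): 3, (1, 1, 1): 4}
cube = segmented_aggregation(base, n_dims=3)
```

Because each slice is itself a smaller cube, the stage-2 loop could call the same routine recursively, which is the recursive computation mentioned above.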

It is often necessary to obtain the aggregation results of a specific slice before the entire aggregation is completed, or alternatively, to have the roll-up done in a particular sequence. A novel feature of the aggregation method of the present invention is that it allows querying to begin even before the regular aggregation process is complete, while still providing fast response. Moreover, in relational OLAP and other systems requiring only partial aggregations, the QDR process dramatically speeds up the query response.

The QDR process is made feasible by the slice-oriented roll-up method of the present invention. After aggregating the first dimension(s), the multidimensional space is composed of independent multidimensional cubes (slices). These cubes can be processed in any arbitrary sequence.

Consequently, the aggregation process of the present invention can be monitored by means of files, shared memory, sockets, or queues to statically or dynamically set the roll-up order.
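Because the slices are independent, setting the roll-up order reduces to a scheduling problem. A minimal sketch, assuming an in-process priority queue as the control channel (the text also mentions files, shared memory and sockets), might look like the following; `RollupScheduler` and its methods are hypothetical names.

```python
import heapq
import itertools

class RollupScheduler:
    """Decide which independent slice to roll up next; lower priority value runs first."""

    def __init__(self, slice_ids):
        self._counter = itertools.count()  # tie-breaker keeps FIFO order among equal priorities
        self._heap = []
        self._done = set()
        for sid in slice_ids:              # static (default) roll-up order
            self.push(sid, priority=1)

    def push(self, slice_id, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), slice_id))

    def promote(self, slice_id):
        """Dynamically move a slice to the front, e.g. because a query needs it."""
        if slice_id not in self._done:
            self.push(slice_id, priority=0)

    def next_slice(self):
        """Return the next slice to aggregate, or None when all are done."""
        while self._heap:
            _, _, sid = heapq.heappop(self._heap)
            if sid not in self._done:      # skip stale duplicate entries
                self._done.add(sid)
                return sid
        return None

scheduler = RollupScheduler(["a", "b", "c"])
scheduler.promote("c")                     # a query arrives that needs slice "c"
order = [scheduler.next_slice() for _ in range(4)]
```

Here slice `"c"` is rolled up first because it was promoted; the remaining slices follow in their static order, and the stale duplicate entry for `"c"` is skipped.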

In order to satisfy a single query coming from a client before the required aggregation result has been prepared, the QDR process of the present invention involves performing a fast on-the-fly aggregation (roll-up) involving only a thin slice of the multidimensional data.

FIG. 9C1 shows a slice required for building up a roll-up result of the 2nd dimension. In case 1, as shown, the aggregation starts from existing data, either basic or previously aggregated in the first dimension. This data is utilized as a basis for QDR aggregation along the second dimension. In case 2, due to lack of previous data, a QDR involves an initial slice aggregation along dimension 3, and thereafter aggregation along the 2nd dimension.

FIG. 9C2 shows two corresponding QDR cases for gaining results in the 3rd dimension. Cases 1 and 2 differ in the amount of initial aggregation required in the 2nd dimension.
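A simplified sketch of the idea in Python: answer one query coordinate by summing only the cells in its slice, preferring cells already rolled up along dimension 1 (the case 1 idea) and otherwise aggregating the slice from base data. It assumes summation, a hypothetical `ALL` marker for rolled-up members, and that any dimension-1 roll-up cells present are complete. It does not reproduce the exact case analysis of FIGS. 9C1 and 9C2, and it scans the dictionaries linearly where a real system would read only the required slice through its index.

```python
ALL = "ALL"  # hypothetical marker for a rolled-up member

def in_slice(coord, query):
    """True if a finer-grained cell contributes to the query coordinate."""
    return all(q == ALL or c == q for c, q in zip(coord, query))

def qdr_rollup(query, base, aggregated):
    """Answer one coordinate on the fly and remember the result.

    base:       base cells (no ALL members), coordinate tuple -> value
    aggregated: cells rolled up so far, coordinate tuple -> value
    """
    if query in aggregated:                  # already available
        return aggregated[query]
    partial = []
    if query[0] == ALL:
        # Reuse cells already rolled up along dimension 1 (finer in all other dimensions).
        partial = [v for c, v in aggregated.items()
                   if c[0] == ALL and ALL not in c[1:] and in_slice(c, query)]
    if partial:
        value = sum(partial)
    else:
        # No usable previous data: aggregate the needed slice from base data.
        value = sum(v for c, v in base.items() if in_slice(c, query))
    aggregated[query] = value                # keep it for later requests
    return value

base = {(0, 0, 0): 1, (0, 1, 0): 2, (1, 0, 1): 3, (1, 1, 1): 4}
stage1 = {(ALL, 0, 0): 1, (ALL, 1, 0): 2, (ALL, 0, 1): 3, (ALL, 1, 1): 4}
```

For example, with `stage1` available, `qdr_rollup((ALL, ALL, 0), base, dict(stage1))` sums two already rolled-up cells instead of going back to the base data.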

FIG. 10A illustrates the “Slice-Storage” method of storing sparse data on storage disks. In general, this data storage method is based on the principle that an ascending-ordered index along the aggregation direction enables fast retrieval of data. FIG. 10A illustrates a unit-wide slice of the multidimensional cube of data. Since the data is sparse, only a few non-NA data points exist. These points are indexed as follows. The Data File consists of data records, in which each (n−1)-dimensional slice is stored in a separate record. These records have a varying length, according to the number of non-NA stored points. For each registered point in the record, IND_k stands for an index in the n-dimensional cube, and Data stands for the value of a given point in the cube.
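A minimal sketch of building such slice records in Python, assuming row-major linearization of the n-dimensional coordinate as the IND_k scheme and `None` as the NA marker. Both are assumptions made for illustration; the text does not give the index formula. Slicing along the most significant dimension keeps the index ranges of successive records ascending and disjoint.

```python
def linear_index(coord, shape):
    """Row-major position of an n-dimensional coordinate (one possible IND_k scheme)."""
    index = 0
    for c, size in zip(coord, shape):
        index = index * size + c
    return index

def build_slice_records(cells, shape, slice_dim=0):
    """Group non-NA cells into one variable-length record per slice, sorted by IND_k."""
    records = {}
    for coord, value in cells.items():
        if value is None:                  # None stands in for NA: such points are not stored
            continue
        ind_k = linear_index(coord, shape)
        records.setdefault(coord[slice_dim], []).append((ind_k, value))
    for entries in records.values():
        entries.sort()                     # ascending index order enables binary search
    return dict(sorted(records.items()))

shape = (2, 3, 4)
cells = {(1, 2, 3): 5.0, (1, 0, 0): 7.0, (0, 1, 1): None, (0, 0, 2): 1.5}
records = build_slice_records(cells, shape)
# {0: [(2, 1.5)], 1: [(12, 7.0), (23, 5.0)]}
```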

FIG. 10B illustrates a novel method for randomly searching for a queried data point in the MDDB of FIG. 6B by using a novel technique of organizing the data files and the directory file used in the storage of the MDDB, so that a simple binary search technique can then be employed within the Aggregation Server of the present invention. According to this method, a metafile termed the DIR File keeps pointers to the Data Files as well as additional parameters such as the start and end addresses of each data record (IND_0, IND_n), its location within the Data File, the record size (n), the file's physical address on disk (D_Path), and auxiliary information on the record (Flags).

A search for a queried data point is then performed by an access to the DIR file. The search along the file can be made using a simple binary search due to the file's ascending order. When the record is found, it is then loaded into main memory to search for the required point, characterized by its index IND_k. The attached Data field represents the queried value. If the exact index is not found, the point is NA (i.e. not available).
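The two-level lookup can be sketched with Python's standard `bisect` module. `DirEntry` is a hypothetical in-memory stand-in for one DIR File entry: it keeps IND_0 and IND_n and, in place of D_Path, location and size, simply holds the record, already "loaded".

```python
import bisect
from dataclasses import dataclass

@dataclass
class DirEntry:
    ind_first: int  # IND_0: first index stored in the record
    ind_last: int   # IND_n: last index stored in the record
    record: list    # (IND_k, Data) pairs in ascending IND_k order; stands in for D_Path/location/size

def find_point(dir_entries, ind_k):
    """Return the Data value stored at IND_k, or None if the point is NA."""
    # Binary search of the DIR file, which is sorted by IND_0.
    starts = [entry.ind_first for entry in dir_entries]
    pos = bisect.bisect_right(starts, ind_k) - 1
    if pos < 0 or ind_k > dir_entries[pos].ind_last:
        return None                        # falls between records: NA
    # "Load" the record and binary-search it for the exact index.
    record = dir_entries[pos].record
    keys = [ind for ind, _ in record]
    i = bisect.bisect_left(keys, ind_k)
    if i < len(keys) and keys[i] == ind_k:
        return record[i][1]
    return None                            # index not present in the record: NA

dir_file = [DirEntry(2, 2, [(2, 1.5)]),
            DirEntry(12, 23, [(12, 7.0), (23, 5.0)])]
```

For brevity the sketch rebuilds the key lists on every lookup; a practical version would keep them in memory so that each lookup costs only the two logarithmic searches.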

FIGS. 11A and 11B illustrate a novel method for pre-processing data such that multi-hierarchies in multi-hierarchical structures are optimally merged.

In particular, FIG. 11A illustrates a novel method which the stand-alone Aggregation Server employs for handling hierarchies. According to the devised method, the inner order of hierarchies within a dimension is optimized to achieve efficient data handling for summations and other mathematical formulas (termed in general “Aggregation”). The order of hierarchy is defined externally. It is brought from a data source to the stand-alone aggregation engine, as a descriptor of data, before the data itself. In the illustrative embodiment, the method assumes hierarchical relations of the data, as shown in FIG. 11A. The way data items are ordered in the memory space of the Aggregation Server, with regard to the hierarchy, has a significant impact on its data handling efficiency.

Notably, when using prior art techniques, multiple handling of data elements, which occurs when a data element is accessed more than once during the aggregation process, has hitherto been unavoidable when the main concern is to effectively handle sparse data. The data structures used in prior art data handling methods have been designed for fast access to non-NA data. According to prior art techniques, each access is associated with a time-consuming search and retrieval in the data structure. For the massive amount of data typically accessed from a Data Warehouse in an OLAP application, such multiple handling of data elements has significantly degraded the efficiency of prior art data aggregation processes. When using prior art data handling techniques, the data element D shown in FIG. 11A must be accessed three times, causing poor aggregation performance.

In accordance with the data handling method of the present invention, the data is pre-ordered for singular handling, as opposed to the multiple handling taught by prior art methods. According to the present invention, elements of base data and their aggregated results are contiguously stored in a way that each element will be accessed only once. This particular order allows forward-only handling, never backward. Once a base data element is stored, or an aggregated result is generated and stored, it is never to be retrieved again for further aggregation. As a result, storage access is minimized. This way of singular handling greatly elevates the aggregation efficiency of large databases. An efficient handling method, as used in the present invention, is shown in FIG. 11B. The data element D, as any other element, is accessed and handled only once.
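Forward-only, singular handling can be sketched as a pass over the hierarchy in an order where every item comes after all of its children. The text does not say how that order is computed; the sketch below uses a topological ordering (Kahn's algorithm) as one possible technique, assumes summation, and uses a hypothetical merged hierarchy over the item names of FIG. 11A rather than the actual structure of FIG. 11B.

```python
from collections import defaultdict, deque

def forward_only_aggregate(base_values, parents):
    """Aggregate a hierarchy in one forward pass.

    parents maps each item to the list of items it rolls up into.
    Every item is handled exactly once: its value is pushed to all of its
    parents and it is never retrieved again for further aggregation.
    """
    pending = defaultdict(int)             # number of children each item still waits for
    for ups in parents.values():
        for p in ups:
            pending[p] += 1
    values = defaultdict(float, base_values)
    ready = deque(item for item in base_values if pending[item] == 0)
    order = []
    while ready:
        item = ready.popleft()
        order.append(item)                 # the single access to this item
        for p in parents.get(item, []):
            values[p] += values[item]
            pending[p] -= 1
            if pending[p] == 0:            # all children handled: parent is complete
                ready.append(p)
    return dict(values), order

# Hypothetical merged hierarchy: C = A + B, H = F + G, D = C + H.
parents = {"A": ["C"], "B": ["C"], "F": ["H"], "G": ["H"], "C": ["D"], "H": ["D"]}
values, order = forward_only_aggregate({"A": 1, "B": 2, "F": 3, "G": 4}, parents)
```

The resulting `order` lists each item once, and D is produced only after C and H are complete.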

FIG. 11A shows an example of a multi-hierarchical database structure having 3 hierarchies. As shown, the base data includes the items A, B, F, and G. The second level is composed of items C, E, H and I. The third level has a single item D, which is common to all three hierarchical structures. In accordance with the method of the present invention, a minimal computing path is always taken. For example, according to the method of the present invention, item D will be calculated as part of structure 1, requiring only two mathematical operations, rather than as part of structure 3, which would need four mathematical operations. FIG. 11B depicts an optimized structure merged from all three hierarchies.
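Choosing the minimal computing path can be framed as picking, among the alternative hierarchies that define an item, the one needing the fewest operations given the values already available. The following Python sketch does this with memoized recursion. The cost model (one binary operation per extra child) and the example definitions are assumptions for illustration; they are not the exact structures of FIG. 11A.

```python
def derivation_cost(item, definitions, available, memo):
    """Fewest binary operations needed to produce `item`.

    definitions: item -> list of alternative child lists (one per hierarchy)
    available:   items whose values already exist (base data or earlier results)
    """
    if item in available:
        return 0
    if item not in memo:
        memo[item] = min(
            len(children) - 1
            + sum(derivation_cost(c, definitions, available, memo) for c in children)
            for children in definitions[item])
    return memo[item]

def cheapest_definition(item, definitions, available):
    """Pick the hierarchy (child list) that derives `item` with the fewest operations."""
    memo = {}
    return min(definitions[item],
               key=lambda children: len(children) - 1 + sum(
                   derivation_cost(c, definitions, available, memo) for c in children))

# Hypothetical alternatives for D: D = C + E (C = A + B, E = F + G), or D = A + B + F + G.
definitions = {"C": [["A", "B"]], "E": [["F", "G"]],
               "D": [["C", "E"], ["A", "B", "F", "G"]]}
```

When C and E have already been computed, deriving D through them costs a single addition, whereas summing the four base items directly costs three.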

FIG. 12 summarizes the different enabling components for segmented aggregation. The minimized operations in handling multi-hierarchies require analysis of the base data. This analysis greatly optimizes data handling and contributes to aggregation speed. Based on this technology, loading and indexing operations become very efficient, minimizing memory and storage access, and speeding up storing and retrieval operations. On top of all the enabling technologies is the segmented aggregation technique, which not only outperforms prior-art aggregation algorithms by orders of magnitude, but also enables the unique QDR, which eliminates the need to wait for full pre-aggregation.

FIG. 13 shows the stand-alone Aggregation Server of the present invention as a component of a central data warehouse, serving the data aggregation needs of URL directory systems, Data Marts, RDBMSs, ROLAP systems and OLAP systems alike.

The reason for the central multidimensional database's rise to corporate necessity is that it facilitates flexible, high-performance access to and analysis of large volumes of complex and interrelated data.

A stand-alone specialized aggregation server, simultaneously serving many different kinds of clients (e.g. data mart, OLAP, URL, RDBMS), has the power of delivering enterprise-wide aggregation in a cost-effective way. This kind of server eliminates the roll-up redundancy over the group of clients, delivering scalability and flexibility.

Performance associated with a central data warehouse is an important consideration in the overall approach. Performance includes aggregation times and query response.

Effective interactive query applications require near real-time performance, measured in seconds. These application performance requirements translate directly into aggregation requirements.

In the prior art, in the case of MOLAP, a full pre-aggregation must be done before querying can start. In the present invention, in contrast to the prior art, the query directed roll-up (QDR) allows instant querying, while the full pre-aggregation is done in the background. In cases where a full pre-aggregation is preferred, the presently invented aggregation outperforms any prior art. For the ROLAP and RDBMS clients, partial aggregations maximize query performance. In both cases, a fast aggregation process is imperative. The aggregation performance of the current invention is orders of magnitude higher than that of the prior art.

The stand-alone scalable aggregation server of the present invention can be used in any MOLAP system environment for answering questions about corporate performance in a particular market, economic trends, consumer behaviors, weather conditions, population trends, or the state of any physical, social, biological or other system or phenomenon on which different types or categories of information, organizable in accordance with a predetermined dimensional hierarchy, are collected and stored within a RDBMS of one sort or another. Regardless of the particular application selected, the address data mapping processes of the present invention will provide a quick and efficient way of managing a MDDB and also enabling decision support capabilities utilizing the same in diverse application environments.

Functional Advantages Gained by the Data Aggregation Server of the Present Invention

The stand-alone “cartridge-style” plug-in features of the data aggregation server of the present invention provide freedom in designing an optimized multidimensional data structure and handling method for aggregation, provide freedom in designing a generic aggregation server matching all OLAP vendors, and enable enterprise-wide centralized aggregation.

The method of Segmented Aggregation employed in the aggregation server of the present invention provides flexibility, scalability, a condition for Query Directed Aggregation, and speed improvement.

The method of Multidimensional data organization and indexing employed in the aggregation server of the present invention provides fast storage and retrieval, a condition for Segmented Aggregation, improves the storing, handling, and retrieval of data in a fast manner, and contributes to structural flexibility to allow sliced aggregation and QDR. It also enables the forwarding and single handling of data with improvements in speed performance.

The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention minimizes the data handling operations in multi-hierarchy data structures.

The method of Query Directed Aggregation (QDR) employed in the aggregation server of the present invention eliminates the need to wait for full aggregation to be completed, and builds up the aggregated data required for full aggregation.
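A minimal sketch of the QDR idea described above, under several assumptions: a plain dictionary stands in for the MDDB, `"*"` is a hypothetical marker meaning "aggregated over this dimension", SUM is the aggregation function, and each aggregate is computed directly from base data rather than from lower-level aggregates for brevity. A query computes and stores only the cell it needs; a later full aggregation then skips every cell that queries have already built. `QueryDirectedCube`, `query` and `full_aggregation` are illustrative names, not the patent's interfaces.

```python
from itertools import product
from typing import Dict, List, Tuple

ALL = "*"  # hypothetical marker: "aggregated over this dimension"
Coords = Tuple[str, ...]


class QueryDirectedCube:
    def __init__(self, base: Dict[Coords, float], members: List[List[str]]) -> None:
        self.base = base
        self.members = members
        self.store: Dict[Coords, float] = dict(base)  # stands in for the MDDB
        self.computed_on_demand = 0

    def _aggregate(self, coords: Coords) -> float:
        # Sum every base cell that matches coords, treating ALL as a wildcard.
        return sum(
            v for k, v in self.base.items()
            if all(c == ALL or c == kc for c, kc in zip(coords, k))
        )

    def query(self, coords: Coords) -> float:
        # Answer immediately: compute the requested cell only if it is missing.
        if coords not in self.store:
            self.store[coords] = self._aggregate(coords)
            self.computed_on_demand += 1
        return self.store[coords]

    def full_aggregation(self) -> int:
        # Fill every remaining cell; cells already built by queries are reused.
        added = 0
        for coords in product(*[m + [ALL] for m in self.members]):
            if coords not in self.store:
                self.store[coords] = self._aggregate(coords)
                added += 1
        return added
```

With a 2 x 2 base cube (years 2023/2024, regions East/West, values 10.0, 5.0, 7.0, 3.0), `query(("2023", ALL))` returns `15.0` at once; the later `full_aggregation()` then has to add only 4 of the 5 possible aggregate cells, because one was already built by the query.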

It is understood that the System and Method of the illustrative embodiments described hereinabove may be modified in a variety of ways which will become readily apparent to those skilled in the art having the benefit of the novel teachings disclosed herein. All such modifications and variations of the illustrative embodiments thereof shall be deemed to be within the scope and spirit of the present invention as defined by the Claims to Invention appended hereto.

1. A data aggregation module for interfacing with a relational database management subsystem (RDBMS) having a relational data store including one or more fact tables for storing at least the base data, and a meta-data store for storing a dictionary containing dimension data, and for servicing database query requests provided by one or more client machines, said data aggregation module comprising: a multi-dimensional database (MDDB) having a multi-dimensional data storage structure including a plurality of data storage cells, wherein each said data storage cell is indexed with multiple dimensions and stores either a base data value or an aggregated multi-dimensional data value; a base data loader for loading said dictionary data and said base data into said data aggregation module; a first data communication interface operably connected between said base data loader and said RDBMS, for transmitting dictionary data and base data from said RDBMS to said data aggregation module; an aggregation engine for receiving base data from said base data loader, and calculating aggregated multi-dimensional data from the base data according to a multi-dimensional data aggregation process; a MDDB handler for storing aggregated multi-dimensional data in said MDDB and retrieving aggregated multi-dimensional data from said MDDB; a second data communication interface operably connected to said aggregation engine and said one or more client machines, for receiving database query requests from said client machines; a database query handler operably connected to said second data communication interface, for handling said database query requests; and a controller for managing the operation of said base data loader, said aggregation engine, said MDDB handler, and said database query handler; wherein after base data loading operations, the base data is initially aggregated along at least one of the dimensions extracted from said dictionary, and the aggregated multi-dimensional data results are stored in said MDDB; and wherein during query servicing operations, (i) each database query request is received over said second data communication interface, (ii) the set of dimensions associated with said database query request is extracted, (iii) said dimensions are provided as multi-dimensional coordinates to said MDDB handler so that (a) said MDDB handler can retrieve aggregated multi-dimensional data from data storage cell(s) in said MDDB specified by said multi-dimensional coordinates, and/or (b) said aggregation engine can dynamically generate, on demand, aggregated multi-dimensional data for storage at data storage cell(s) in said MDDB specified by said multi-dimensional coordinates, and (iv) the retrieved aggregated multi-dimensional data is returned to said database query handler, over said second data communication interface, for transmission to said client machine for servicing the database query request.
2. The data aggregation module of claim 1, wherein during data loading operations, said base data loader extracts dimension data from the dictionary in said meta-data store, forwards the dimension data over said first data communication interface to said aggregation engine, and configures said MDDB using said dimension data so that each data storage cell therein is specified by a set of multi-dimensional coordinates.
3. The data aggregation module of claim 2, wherein during aggregation operations, said base data loader extracts base data from said fact table(s), and forwards the base data to said aggregation engine over said first data communication interface; and said aggregation engine calculates aggregated multi-dimensional data from the base data according to said multi-dimensional data aggregation process, and stores the aggregated multi-dimensional data within data storage cell(s) in said MDDB specified by a set of multi-dimensional coordinates.
4. The data aggregation module of claim 1, wherein said first data communication interface is a standard interface selected from the group consisting of OLDB, OLE-DB, ODBC, SQL, API, and JDBC.
5. The data aggregation module of claim 1, wherein said second data communication interface is a standard interface selected from the group consisting of OLDB, OLE-DB, ODBC, SQL, API, and JDBC.
6. The data aggregation module of claim 1, wherein said query interface is implemented in a module within said RDBMS.
7. The data aggregation module of claim 1, wherein said base data loader further includes an adapter (interface) for mapping the data types of said dictionary into the data types used in said data aggregation module, or for mapping a standard data type used to represent said dictionary into the data types used in said data aggregation module.
8. The data aggregation module of claim 1, wherein said database query handler further includes means for transforming each database query request so as to optimize it for efficient database query request handling.
9. The data aggregation module of claim 1, wherein said multi-dimensional aggregation process supported by said aggregation engine further operates as follows: (a) if the aggregated multi-dimensional data required to answer a given database query request is already pre-calculated and stored within said MDDB, then the pre-aggregated multi-dimensional data is retrieved by said MDDB handler and returned to said client machine via said database query handler; and (b) if the required multi-dimensional data is not already pre-aggregated and stored within said MDDB, then the required aggregated multi-dimensional data is calculated on demand by said aggregation engine, and the aggregated multi-dimensional data result is automatically transferred to said client machine, via said database query handler, while being simultaneously stored within said MDDB by said MDDB handler.
10. The data aggregation module of claim 1, wherein said first and second data communication interfaces are standard interfaces enabling cartridge-style interfacing with said RDBMS and said one or more client machines.
11. The data aggregation module of claim 1, wherein said client machines comprise one or more OLAP servers.
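The loading and query-servicing flow recited in claims 1 and 9 can be sketched as follows. This is a hedged, in-memory illustration only: a dictionary stands in for the MDDB, `"*"` is a hypothetical "aggregated over this dimension" marker, SUM is assumed as the aggregation function, and the class and method names (`AggregationModule`, `handle_query`, `_engine`) are hypothetical. The sketch loads base data, performs an initial aggregation along the first dictionary dimension, turns each query request into MDDB coordinates, and then either retrieves a stored cell (claim 9(a)) or computes and stores it on demand (claim 9(b)).

```python
from typing import Dict, List, Tuple

ALL = "*"  # hypothetical marker: "aggregated over this dimension"
Coords = Tuple[str, ...]


class AggregationModule:
    def __init__(self, dictionary: Dict[str, List[str]],
                 fact_rows: List[Tuple[Coords, float]]) -> None:
        # Base data loader: the dimension order is taken from the dictionary.
        self.dictionary = dictionary
        self.dims = list(dictionary)
        self.base: Dict[Coords, float] = {}
        for coords, value in fact_rows:
            self.base[coords] = self.base.get(coords, 0.0) + value
        self.mddb: Dict[Coords, float] = dict(self.base)
        # Initial aggregation along one dimension (here: the first one).
        for coords, value in self.base.items():
            rolled = (ALL,) + coords[1:]
            self.mddb[rolled] = self.mddb.get(rolled, 0.0) + value

    def _engine(self, coords: Coords) -> float:
        # Aggregation engine: sum matching base cells, ALL acting as a wildcard.
        return sum(v for k, v in self.base.items()
                   if all(c == ALL or c == kc for c, kc in zip(coords, k)))

    def handle_query(self, request: Dict[str, str]) -> float:
        # Extract the request's dimensions as MDDB coordinates; omitted
        # dimensions are treated as fully aggregated.
        coords = tuple(request.get(d, ALL) for d in self.dims)
        for d, member in zip(self.dims, coords):
            if member != ALL and member not in self.dictionary[d]:
                raise KeyError(f"unknown member {member!r} for dimension {d!r}")
        if coords in self.mddb:         # (a) already aggregated: retrieve
            return self.mddb[coords]
        result = self._engine(coords)   # (b) compute on demand ...
        self.mddb[coords] = result      # ... and store for later queries
        return result
```

With a dictionary of `year` (2023, 2024) and `region` (East, West) and three fact rows, a request for `{"region": "East"}` is answered from the initial year-level roll-up, while `{"year": "2023"}` is computed once and then kept in the MDDB. A real implementation would stream facts over the first data communication interface rather than hold them in a Python list.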